Search for: All records

Creators/Authors contains: "Pevzner, Pavel A."


Gene prediction in the immunoglobulin loci

https://doi.org/10.1101/gr.276676.122

Sirupurapu, Vikram ; Safonova, Yana ; Pevzner, Pavel A. ( May 2022 , Genome Research)

The V(D)J recombination process rearranges the variable (V), diversity (D), and joining (J) genes in the immunoglobulin (IG) loci to generate antibody repertoires. Annotation of these loci across various species and predicting the V, D, and J genes (IG genes) are critical for studies of the adaptive immune system. However, because the standard gene finding algorithms are not suitable for predicting IG genes, they have been semimanually annotated in very few species. We developed the IGDetective algorithm for predicting IG genes and applied it to species with the assembled IG loci. IGDetective generated the first large collection of IG genes across many species and enabled their evolutionary analysis, including the analysis of the “bat IG diversity” hypothesis. This analysis revealed extremely conserved V genes in evolutionarily distant species, indicating that these genes may be subjected to the same selective pressure, for example, pressure driven by common pathogens. IGDetective also revealed extremely diverged V genes and a new family of evolutionarily conserved V genes in bats with unusual noncanonical cysteines. Moreover, unlike in all previously reported antibodies, these cysteines are located within complementarity-determining regions. Because cysteines form disulfide bonds, we hypothesize that these cysteine-rich V genes might generate antibodies with noncanonical conformations and could potentially form a unique part of the immune repertoire in bats. We also analyzed the diversity landscape of the recombination signal sequences and revealed their features that trigger the high/low usage of the IG genes.
Full Text Available
viralFlye: assembling viruses and identifying their hosts from long-read metagenomics data

https://doi.org/10.1186/s13059-021-02566-x

Antipov, Dmitry ; Rayko, Mikhail ; Kolmogorov, Mikhail ; Pevzner, Pavel A. ( February 2022 , Genome Biology)

Although the use of long-read sequencing improves the contiguity of assembled viral genomes compared to short-read methods, assembling complex viral communities remains an open problem. We describe the viralFlye tool for identification and analysis of metagenome-assembled viruses in long-read assemblies. We show it significantly improves viral assemblies and demonstrate that long reads result in a much larger array of predicted virus-host associations as compared to short-read assemblies. We demonstrate that the identification of novel CRISPR arrays in bacterial genomes from a newly assembled metagenomic sample provides information for predicting novel hosts for novel viruses.

Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads

https://doi.org/10.1038/s41587-022-01220-6

Bankevich, Anton ; Bzikadze, Andrey V. ; Kolmogorov, Mikhail ; Antipov, Dmitry ; Pevzner, Pavel A. ( February 2022 , Nature Biotechnology)

Full Text Available
Analysis of metagenome-assembled viral genomes from the human gut reveals diverse putative CrAss-like phages with unique genomic features

https://doi.org/10.1038/s41467-021-21350-w

Yutin, Natalya ; Benler, Sean ; Shmakov, Sergei A. ; Wolf, Yuri I. ; Tolstoy, Igor ; Rayko, Mike ; Antipov, Dmitry ; Pevzner, Pavel A. ; Koonin, Eugene V. ( December 2021 , Nature Communications)
CrAssphage is the most abundant human-associated virus and the founding member of a large group of bacteriophages, discovered in animal-associated and environmental metagenomes, that infect bacteria of the phylum Bacteroidetes. We analyze 4907 Circular Metagenome Assembled Genomes (cMAGs) of putative viruses from human gut microbiomes and identify nearly 600 genomes of crAss-like phages that account for nearly 87% of the DNA reads mapped to these cMAGs. Phylogenetic analysis of conserved genes demonstrates the monophyly of crAss-like phages, a putative virus order, and of 5 branches, potential families within that order, two of which have not been identified previously. The phage genomes in one of these families are almost twofold larger than the crAssphage genome (145-192 kilobases), with high density of self-splicing introns and inteins. Many crAss-like phages encode suppressor tRNAs that enable read-through of UGA or UAG stop codons, mostly in late phage genes. A distinct feature of the crAss-like phages is the recurrent switch of the phage DNA polymerase type between A and B families. Thus, comparative genomic analysis of the expanded assemblage of crAss-like phages reveals aspects of genome architecture and expression as well as phage biology that were not apparent from the previous work on phage genomics.
Full Text Available
metaFlye: scalable long-read metagenome assembly using repeat graphs

https://doi.org/10.1038/s41592-020-00971-x

Kolmogorov, Mikhail ; Bickhart, Derek M. ; Behsaz, Bahar ; Gurevich, Alexey ; Rayko, Mikhail ; Shin, Sung Bong ; Kuhn, Kristen ; Yuan, Jeffrey ; Polevikov, Evgeny ; Smith, Timothy P. ; et al ( November 2020 , Nature Methods)
Full Text Available
Plasmid detection and assembly in genomic and metagenomic data sets

https://doi.org/10.1101/gr.241299.118

Antipov, Dmitry ; Raiko, Mikhail ; Lapidus, Alla ; Pevzner, Pavel A. ( June 2019 , Genome Research)

Full Text Available
Assembly of long, error-prone reads using repeat graphs

https://doi.org/10.1038/s41587-019-0072-8

Kolmogorov, Mikhail ; Yuan, Jeffrey ; Lin, Yu ; Pevzner, Pavel A. ( April 2019 , Nature Biotechnology)

Full Text Available
Complete genomic and epigenetic maps of human centromeres

https://doi.org/10.1126/science.abl4178

Altemose, Nicolas ; Logsdon, Glennis A. ; Bzikadze, Andrey V. ; Sidhwani, Pragya ; Langley, Sasha A. ; Caldas, Gina V. ; Hoyt, Savannah J. ; Uralsky, Lev ; Ryabov, Fedor D. ; Shew, Colin J. ; et al ( April 2022 , Science)

INTRODUCTION To faithfully distribute genetic material to daughter cells during cell division, spindle fibers must couple to DNA by means of a structure called the kinetochore, which assembles at each chromosome’s centromere. Human centromeres are located within large arrays of tandemly repeated DNA sequences known as alpha satellite (αSat), which often span millions of base pairs on each chromosome. Arrays of αSat are frequently surrounded by other types of tandem satellite repeats, which have poorly understood functions, along with nonrepetitive sequences, including transcribed genes. Previous genome sequencing efforts have been unable to generate complete assemblies of satellite-rich regions because of their scale and repetitive nature, limiting the ability to study their organization, variation, and function.
RATIONALE Pericentromeric and centromeric (peri/centromeric) satellite DNA sequences have remained almost entirely missing from the assembled human reference genome for the past 20 years. Using a complete, telomere-to-telomere (T2T) assembly of a human genome, we developed and deployed tailored computational approaches to reveal the organization and evolutionary patterns of these satellite arrays at both large and small length scales. We also performed experiments to map precisely which αSat repeats interact with kinetochore proteins. Last, we compared peri/centromeric regions among multiple individuals to understand how these sequences vary across diverse genetic backgrounds.
RESULTS Satellite repeats constitute 6.2% of the T2T-CHM13 genome assembly, with αSat representing the single largest component (2.8% of the genome). By studying the sequence relationships of αSat repeats in detail across each centromere, we found genome-wide evidence that human centromeres evolve through “layered expansions.” Specifically, distinct repetitive variants arise within each centromeric region and expand through mechanisms that resemble successive tandem duplications, whereas older flanking sequences shrink and diverge over time. We also revealed that the most recently expanded repeats within each αSat array are more likely to interact with the inner kinetochore protein Centromere Protein A (CENP-A), which coincides with regions of reduced CpG methylation. This suggests a strong relationship between local satellite repeat expansion, kinetochore positioning, and DNA hypomethylation. Furthermore, we uncovered large and unexpected structural rearrangements that affect multiple satellite repeat types, including active centromeric αSat arrays. Last, by comparing sequence information from nearly 1600 individuals’ X chromosomes, we observed that individuals with recent African ancestry possess the greatest genetic diversity in the region surrounding the centromere, which sometimes contains a predominantly African αSat sequence variant.
CONCLUSION The genetic and epigenetic properties of centromeres are closely interwoven through evolution. These findings raise important questions about the specific molecular mechanisms responsible for the relationship between inner kinetochore proteins, DNA hypomethylation, and layered αSat expansions. Even more questions remain about the function and evolution of non-αSat repeats. To begin answering these questions, we have produced a comprehensive encyclopedia of peri/centromeric sequences in a human genome, and we demonstrated how these regions can be studied with modern genomic tools. Our work also illuminates the rich genetic variation hidden within these formerly missing regions of the genome, which may contribute to health and disease. This unexplored variation underlines the need for more T2T human genome assemblies from genetically diverse individuals.
Figure: Gapless assemblies illuminate centromere evolution. (Top) The organization of peri/centromeric satellite repeats. (Bottom left) A schematic portraying (i) evidence for centromere evolution through layered expansions and (ii) the localization of inner-kinetochore proteins in the youngest, most recently expanded repeats, which coincide with a region of DNA hypomethylation. (Bottom right) An illustration of the global distribution of chrX centromere haplotypes, showing increased diversity in populations with recent African ancestry.
Full Text Available
Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads

https://doi.org/10.1186/s13059-019-1834-9

Sanders, Jon G. ; Nurk, Sergey ; Salido, Rodolfo A. ; Minich, Jeremiah ; Xu, Zhenjiang Z. ; Zhu, Qiyun ; Martino, Cameron ; Fedarko, Marcus ; Arthur, Timothy D. ; Chen, Feng ; et al ( December 2019 , Genome Biology)

Full Text Available
The complete sequence of a human genome

https://doi.org/10.1126/science.abj6987

Nurk, Sergey ; Koren, Sergey ; Rhie, Arang ; Rautiainen, Mikko ; Bzikadze, Andrey V. ; Mikheenko, Alla ; Vollger, Mitchell R. ; Altemose, Nicolas ; Uralsky, Lev ; Gershman, Ariel ; et al ( April 2022 , Science)

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Full Text Available